Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[11.x] Handle circular references in model serialization #52461

Merged

Conversation

samlev
Copy link
Contributor

@samlev samlev commented Aug 13, 2024

This is a companion PR to #51582 which resolves the issues raised around serialization of models that have circular references in relations.

TL;DR:

This PR makes some minor changes to how the main recursive operations on Models work to prevent getting into infintite recursion down a stack of circular references, which leads to a stack overflow. It does this by preventing recursive calls to the same method on an object.

What are circular references?

Put simply - circular references occur when two objects hold a reference to each other. In the context of this PR, it's when a model holds a relationship to another model which in turn holds a relationship back to the first instance. For example, if you had a user with multiple posts, and you wanted to give each post a reference back to the original user, you might do something like this:

$user->posts->each->setRelation('user', $user);

This would mean that each post has a link back to the original user, not to another copy of the user.

Why would you want to do this?

There are many reasons, but the most common reason that I encounter is authorization. What if you had a policy similar to this for posts:

class PostPolicy
{
    // ...
    public function view(User $user, Post $post): bool
    {
        return $post->published_at?->greaterThan(now()) || $post->user->is($user);
    }
    //...
}

Note that the post references it's own copy of the user object. When you're looping through posts on an index page you can easily eager-load the user (Post::with(['user'])->get()), but if you're pulling the posts for a particular user, you either have to:

  • Eager-load the user on the posts ($user->load(['posts.user']);) which leaves you with multiple copies of the user that you already had.
  • Deal with N+1 queries as each post runs a query to pull out its user - again, something that you had already pulled out of the database.
  • Set the relation to the existing model, and end up with a circular reference.

Setting the relation is the most memory efficient, database efficient, and has some other benefits like being able to naviagate back and forth between the user and posts without generating more queries, e.g.

<!-- user/post-index.blade.php -->
@foreach($user->posts as $idx => $post)
  <x-post :$post :position="$idx + 1" />
@endforeach

<!-- components/post.blade.php -->
<a href="{{ route('user.posts.edit', [$post->user, $post]) }}">{{ $post->title }}</a>
({{ $position }} / {{ $post->user->posts->count() }})

How does this cause problems?

For the most part, it doesn't. I've used this pattern many times on many projects, and PR #51582 would make it possible to set the inverse relation automatically in a number of situations.

The issue is when it comes to serializing a model that has circular relations. When you call toJson() or toArray() on a model with circular relations, or you attempt to send that model to a queue, or you simply want to just call push() to save the model and any of it's dependent children. Essentially any time where it tries to walk down the stack of relations recursively, you'll end up in this loop:

>>> dump($user->toArray());
[
    'id' => 1,
    'posts' => [
        0 => [
            'id' => 1234,
            'user' => [
                'id' => 1,
                'posts' => [
                    0 => [
                        'id' => 1234,
                        'user' => [
                            'id' => 1,
                            'posts' => [// ... all the way to a stack overflow

I don't know how many people have experienced a SegFault in PHP, but it's... not a fun issue to diagnose.

How does this PR fix that issue?

This adds a new internal method to models called once() withoutRecursion(). This keeps track of the call stack and will prevent the same method from being called on the same instance of an object more than once within that stack. So instead of that call stack above, you end up with something like this:

>>> dump($user->toArray());
[
    'id' => 1,
    'posts' => [
        0 => [
            'id' => 1234,
            'user' => [
                'id' => 1,
            ],
        ],
        1 => [
            'id' => 2345,
            'user' => [
                'id' => 1,
            ],
        ],
        2 => [
            'id' => 3456,
            'user' => [
                'id' => 1,
            ],
        ],
    ]
];

Because toArray() was called on $user at the top level, it doesn't get called again, so it doesn't dig down again, but because each post is a unique object, toArray() can get called on each of them.

A Couple Of Questions That You Might Want Answered

Is this backwards compatible?

Yes. This won't break any existing code because any code that would have encoutnered this problem wouldn't be working. The only concerns are if someone has already defined a once() method on a model (in which case I'm happy to change this method name if you'd prefer - I was just keeping consistency with the other existing functionality for the once() helper).

Why not use the once() helper?

For two major reasons:

  1. It's not recursion safe

It expects a call to complete so that it can store the value before it can prevent subsequent calls to the method. This means that if the method is recursive, it will never complete. The change that I've made here accepts a "default" value which will be stored as the return value before the method is called, meaning that any recursion will return early with the default value instead of continuing to dig itself deeper.

  1. once() would keep the result for the rest of the request, even if the model changed

We don't want to prevent ->toArray() from changing its result if the models or relations change. Each time it gets called we want to actually pull the result at that point in time. We only want the "once" behaviour to persist within the context of that one call stack. This is particularly important for ->push() - you don't want to store the result of $model->push() and never actually call it again.

Has anyone other than you even encoutnered this?

Yep! @reinink has talked about Optimizing circular relationships in Laravel, @stancl made a package to add hasManyWithInverse(), and @stayallive extended that, with an specific note about dealing with recursion.

Even if the inverse() pull request never gets merged, I believe that this would still be a valuable fix for models.

Copy link

Thanks for submitting a PR!

Note that draft PR's are not reviewed. If you would like a review, please mark your pull request as ready for review in the GitHub user interface.

Pull requests that are abandoned in draft may be closed due to inactivity.

@samlev samlev force-pushed the bugfix/11.x_handle_recursion_in_to_array branch from 8e827b8 to 396b9ce Compare August 13, 2024 00:54
If a circular relationship is set up between two models using
 `setRelation()` (or similar methods) then calling
 `$model->relationsToArray()` will call `toArray()` on each related
 model, which will in turn call `relationsToArray()`. In an instance
 where one of the related models is an object that has already had
 `toArray()` called further up the stack, it will infinitely recurse
  down and result in a stack overflow.

The same issue exists with `getQueueableRelations()`, `push()`, and
 potentially other methods. This adds tests which will fail if one of
 the known potentially problematic methods gets into a recursive loop.
@samlev samlev force-pushed the bugfix/11.x_handle_recursion_in_to_array branch 3 times, most recently from 2e16f51 to c29aa6b Compare August 13, 2024 02:02
This adds a trait for Eloquent which can be used to prevent recursively
 serializing circular references.
@samlev samlev force-pushed the bugfix/11.x_handle_recursion_in_to_array branch from c29aa6b to 1e175f4 Compare August 13, 2024 02:08
@samlev samlev marked this pull request as ready for review August 13, 2024 02:11
@samlev samlev force-pushed the bugfix/11.x_handle_recursion_in_to_array branch from 1cb4499 to 05cdf3b Compare August 13, 2024 09:48
@Jubeki
Copy link
Contributor

Jubeki commented Aug 19, 2024

I like this PR in general!

Are there performance / memory implications with these changes? Maybe some benchmarks would be great with the before and after comparison.

@samlev
Copy link
Contributor Author

samlev commented Aug 19, 2024

Are there performance / memory implications with these changes? Maybe some benchmarks would be great with the before and after comparison.

That's a very difficult question to answer because the "before" case that this PR is targeting would have resulted in a stack overflow, so the test would be "it works at all" vs "Segmentation Fault".

In terms of non-circular references the heaviest memory implication would be temporarily holding on to a second copy of a model's attributes only on the toArray() method, but that would be cleared as soon as the toArray() for a specific model finishes. If there was an initial "working" implementation that didn't cause a segfault, this implementation would use less processing/memory because every time toArray() was called on a model recursively it would return the cached copy instead of re-calculating it again... But again, that's entirely theoretical because that case would never work originally.

My understanding of WeakMap in general won't really use much memory because it only actually holds a weak reference to an object that already exists in memory (so essentially just a pointer), instead of a clone of the object. Effectively the only extra memory that it uses would be the cached value, which is always a Boolean true for push(), and an empty array for getQueueableRelations(). toArray() is the only method that stores a significant payload in the cache, and the finally block should always clear that up, even if an exception was thrown somewhere in the call.

The inverse() PR (#51582) that this PR is a companion to would result in lower memory usage and fewer DB calls in a number of situations.

@Jubeki
Copy link
Contributor

Jubeki commented Aug 19, 2024

That's a very difficult question to answer because the "before" case that this PR is targeting would have resulted in a stack overflow, so the test would be "it works at all" vs "Segmentation Fault".

Yeah, I meant regarding a non-circular toArray call, because it also uses the without Recursion part.

@samlev
Copy link
Contributor Author

samlev commented Aug 19, 2024

Yeah, I meant regarding a non-circular toArray call, because it also uses the without Recursion part.

So there would be extra memory usage due to attributesToArray() being cached in case there's recursion. I could mitigate against it but it would only add extra complexity to address something that is probably not going to cause major issues for the majority of use cases. I'm already not 100% happy with the current increase to complexity for toArray(), but I'm hedging memory usage against processing time in response to @Tofandel 's review. If it's a problem for specific users, they can always swap out toArray() on affected models.

@taylorotwell
Copy link
Member

taylorotwell commented Aug 22, 2024

@samlev really dig the PR, but unfortunately it is a pretty big hit on performance. For example, seed 1,000 users into the database and return User::all()->toArray() from a route. That toArray call is 2x slower on this PR compared to current 11.x branch. On my machine, I jump from roughly 40-50ms to about 100ms on this PR.

use App\Models\User;
use Illuminate\Support\Benchmark;

$users = User::all();

Benchmark::dd(fn () => $users->toArray());

@samlev
Copy link
Contributor Author

samlev commented Aug 22, 2024

@taylorotwell There's definitely some ways to clean it up more. I'm guessing that the speed issue is the additional call to attributesToArray() which could actually be done as an "on demand"/"when needed" - I was just trying to avoid adding even more complexity.

@taylorotwell
Copy link
Member

@samlev we may just have to give the extra complexity a shot - hopefully won't be too bad. I don't think we can stomach a 2x performance hit for all Eloquent serialization.

@samlev samlev force-pushed the bugfix/11.x_handle_recursion_in_to_array branch from d51bc1e to 8982c36 Compare August 22, 2024 04:47
@samlev
Copy link
Contributor Author

samlev commented Aug 22, 2024

@taylorotwell this new update should negate the performance penalty for non-circular relations (and should more or less resolve @Jubeki 's concerns about increased memory usage).

The only other "improvement" that I can think of would be to get extremely meta by forcing toArray() to be called recursively:

public function toArray()
{
    return $this->withoutRecursion(
        fn () => array_merge($this->toArray(), $this->relationsToArray()),
        fn () => $this->attributesToArray(),
    );
}

which would have the effect of only calling attributesToArray() once to fulfill the second toArray() call and give the default value. So the call stack would go something like this (visualising recursive call stacks in text is hard...):

toArray() {
  withoutRecursion() {
    toArray() {
      withoutRecursion() {
        attributesToArray();
      }
    }
    relationsToArray();
  }
}

The end result, though, is that attributesToArray() would be called only once, and cached, then merged with relationsToArray() which would (ultimately) use the cached copy of attributesToArray() if it does actually have circular references.

@taylorotwell
Copy link
Member

@samlev Indeed I don't see the performance regression anymore. Nice! Will do a final review in the morning and try to get this merged. 👍

@rodrigopedra
Copy link
Contributor

rodrigopedra commented Aug 22, 2024

Good to see this being handled, but isn't using a flag a better approach? Regarding performance?

Something like this:

<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\BelongsTo;
use Illuminate\Database\Eloquent\Relations\HasMany;

class Item extends Model
{
    // recursion flag
    protected $visited = false;

    public function item(): BelongsTo
    {
        return $this->belongsTo(Item::class);
    }

    public function items(): HasMany
    {
        return $this->hasMany(Item::class);
    }

    public function toArray(): array
    {
        if ($this->visited) {
            // TODO: improve the fallback case, maybe an array 
            // with a single marker value to filter out later?
            return [];
        }

        $this->visited = true;

        try {
            return parent::toArray();
        } finally {
            $this->visited = false;
        }
    }
}

With this test code:

<?php

use App\Models\Item;
use Illuminate\Support\Facades\Artisan;

Artisan::command('circular', function () {
    $parent = Item::query()->create();
    $parent->items()->create();
    $parent->items()->create();
    $parent->items()->create();

    $parent->load(['items']);

    // set parent instance manually to create the circular reference
    $parent->items->each->setRelation('item', $parent);

    $this->info($parent->toJson(\JSON_PRETTY_PRINT));

    // print twice to show the visited property is cleared nicely
    $this->info($parent->toJson(\JSON_PRETTY_PRINT));
});

Yields this output:

$ php artisan circular
{
    "updated_at": "2024-08-22T05:28:24.000000Z",
    "created_at": "2024-08-22T05:28:24.000000Z",
    "id": 1,
    "items": [
        {
            "id": 2,
            "item_id": 1,
            "created_at": "2024-08-22T05:28:24.000000Z",
            "updated_at": "2024-08-22T05:28:24.000000Z",
            "item": []
        },
        {
            "id": 3,
            "item_id": 1,
            "created_at": "2024-08-22T05:28:24.000000Z",
            "updated_at": "2024-08-22T05:28:24.000000Z",
            "item": []
        },
        {
            "id": 4,
            "item_id": 1,
            "created_at": "2024-08-22T05:28:24.000000Z",
            "updated_at": "2024-08-22T05:28:24.000000Z",
            "item": []
        }
    ]
}
{
    "updated_at": "2024-08-22T05:28:24.000000Z",
    "created_at": "2024-08-22T05:28:24.000000Z",
    "id": 1,
    "items": [
        {
            "id": 2,
            "item_id": 1,
            "created_at": "2024-08-22T05:28:24.000000Z",
            "updated_at": "2024-08-22T05:28:24.000000Z",
            "item": []
        },
        {
            "id": 3,
            "item_id": 1,
            "created_at": "2024-08-22T05:28:24.000000Z",
            "updated_at": "2024-08-22T05:28:24.000000Z",
            "item": []
        },
        {
            "id": 4,
            "item_id": 1,
            "created_at": "2024-08-22T05:28:24.000000Z",
            "updated_at": "2024-08-22T05:28:24.000000Z",
            "item": []
        }
    ]
}

Of course, there is this dangling array on the child elements' parent references, but this is just a proof of concept.

I thought on hacking with the Model@relationsToArray() method to filter them out, but didn't want to invest time on it before asking for feedback.


EDIT: without the overridden Item@toArray() method, the command above segfaults.

@rodrigopedra
Copy link
Contributor

Yes, adding a marker will remove the dangling ones:

<?php

namespace App;

enum Visited
{
    case VISITED;
}

Changed model to account for the marker:

<?php

namespace App\Models;

use App\Visited;
use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\BelongsTo;
use Illuminate\Database\Eloquent\Relations\HasMany;

class Item extends Model
{
    // recursion flag
    protected $visited = false;

    public function item(): BelongsTo
    {
        return $this->belongsTo(Item::class);
    }

    public function items(): HasMany
    {
        return $this->hasMany(Item::class);
    }

    public function toArray(): array
    {
        if ($this->visited) {
            return [Visited::VISITED];
        }

        $this->visited = true;

        try {
            return parent::toArray();
        } finally {
            $this->visited = false;
        }
    }

    public function relationsToArray(): array
    {
        // filters out visited relations
        return \array_filter(parent::relationsToArray(), fn ($item) => $item !== [Visited::VISITED]);
    }
}

Updated output (with a single toJson() call):

$ php artisan circular
{
    "updated_at": "2024-08-22T05:43:34.000000Z",
    "created_at": "2024-08-22T05:43:34.000000Z",
    "id": 1,
    "items": [
        {
            "id": 2,
            "item_id": 1,
            "created_at": "2024-08-22T05:43:34.000000Z",
            "updated_at": "2024-08-22T05:43:34.000000Z"
        },
        {
            "id": 3,
            "item_id": 1,
            "created_at": "2024-08-22T05:43:34.000000Z",
            "updated_at": "2024-08-22T05:43:34.000000Z"
        },
        {
            "id": 4,
            "item_id": 1,
            "created_at": "2024-08-22T05:43:35.000000Z",
            "updated_at": "2024-08-22T05:43:35.000000Z"
        }
    ]
}

Of course, instead of overriding those methods on user land, we can consider incorporating them upstream.

@rodrigopedra
Copy link
Contributor

Just for easiness to reproduce (e.g. copy and paste), here is the migration I used:

<?php

use App\Models\Item;
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('items', function (Blueprint $table) {
            $table->id();
            $table->foreignIdFor(Item::class)->nullable()->constrained();
            $table->timestamps();
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('items');
    }
};

@rodrigopedra
Copy link
Contributor

Reproducible repo: https://github.com/rodrigopedra/circular

P.S.: moved overrides to a trait

@samlev
Copy link
Contributor Author

samlev commented Aug 22, 2024

@rodrigopedra the minor difference between the two implementations here is that the nested children will still have the relation, they just won't have the relation of the relations.

So with your example (timestamps removed for brevity):

{
    "id": 1,
    "items": [
        {
            "id": 2,
            "item_id": 1
        },
        {
            "id": 3,
            "item_id": 1
        },
        {
            "id": 4,
            "item_id": 1
        }
    ]
}

The implementation in this PR would still have the "item" on each of the children:

{
    "id": 1,
    "items": [
        {
            "id": 2,
            "item_id": 1,
            "item": {
                "id": 1
            }
        },
        {
            "id": 3,
            "item_id": 1,
            "item": {
                "id": 1
            }
        },
        {
            "id": 4,
            "item_id": 1,
            "item": {
                "id": 1
            }
        }
    ]
}

This doesn't seem super important, but it's more "accurate" as to the actual state of each object, and more consistent with the result if you had eager-loaded the relations (e.g. Item::with('items.item')).

There's more to it, but we also have the potential for a method that causes recursion to call another method that might cause recursion, and we don't want to prevent that second call from happening but we still want to be sure that everything is cleaned up after any individual call to toArray() or push() or getQueueableRelations(). Adding it in this way instead of an individual flag opens up the API to userland to do things which may have otherwise been considered "unsafe" - it's a protected method that people could use themselves in their models when doing something like custom eloquent attributes, etc.

Doing it with a static weak map also opens up possibilities with inspecting levels of recursion (e.g. static::getRecursionCache()->count() would tell you how many individual objects of the current class are currently in play in the current call stack - something that might be useful for diagnostics or observability, or even just limiting the size of the call stack). WeakMaps "clean up" after themselves - or more to the point they don't prevent objects from being garbage collected - so we also don't really have a risk of orphaned records, or holding on to extra memory for any longer than we need to.

Ultimately, this PR isn't just about toArray() - it's resolving an issue that I've personally encounterd a few times, and other people have too, often in situations where they didn't expect to have a "gotcha".

@donnysim
Copy link
Contributor

donnysim commented Aug 22, 2024

@rodrigopedra yours has an edge case when the relation collection is shared while current PR does not:

$sharedItem = new Item(['name' => 'shared']);

$itemA = new Item(['name' => 'a']);
$itemA->setRelation('shared', $sharedItem);
$itemB = new Item(['name' => 'b']);
$itemB->setRelation('shared', $sharedItem);

$items = Collection::make([$itemA, $itemB]);

$parent = new Item(['name' => 'parent']);
$parent->setRelation('items', $items);

$sharedItem->setRelation('items', $items);

dd($parent->toArray());

Output (dd because you cannot json serialize non-backed enum and it crashes):

array:2 [
  "name" => "parent"
  "items" => array:2 [
    0 => array:2 [
      "name" => "a"
      "shared" => array:2 [
        "name" => "shared"
        "items" => array:2 [
          0 => array:1 [
            0 => App\Visited {#1671
              +name: "VISITED"
            }
          ]
          1 => array:1 [
            "name" => "b"
          ]
        ]
      ]
    ]
    1 => array:2 [
      "name" => "b"
      "shared" => array:2 [
        "name" => "shared"
        "items" => array:2 [
          0 => array:1 [
            "name" => "a"
          ]
          1 => array:1 [
            0 => App\Visited {#1671}
          ]
        ]
      ]
    ]
  ]
]

@rodrigopedra
Copy link
Contributor

rodrigopedra commented Aug 22, 2024

@samlev thanks, I noticed that when looking at the tests. And you are right, that'd be more accurate.

@donnysim while digging into the PR's code, I moved away from using this marker. But, thank you so much for reviewing it, and for the insights.

@samlev I updated my test repo, and also created a fork to use the same test set as yours (Minus the test cases that, I commented out that I don't quite get the assertion being 0):

11.x...rodrigopedra:framework:11.x

I didn't want to push a new PR, as I don't want to overtake this one.

I understood from your test cases that my alternative implementation was missing the mixed function recursion. So I incorporated the stack trace you have in yours.

Can you review it? I believe it is a simpler approach, and avoids a static global WeakMap.

Nonetheless, thank you very much for this PR, I had to deal with manually unsetting relations on many projects before.

@rodrigopedra
Copy link
Contributor

@donnysim, on my POC repository, I added a second command to account for the use case you mentioned.

Follow its README and run php artisan shared

Thanks again for the insights.

@Tofandel
Copy link
Contributor

Tofandel commented Aug 22, 2024

The big problem with the flag is if you call the method multiple times after having changed attributes (or even without) for example, then it will not output the correct array, which in theory the withoutRecursion doesn't have

It's a bit of a functional programming vs OOP

@rodrigopedra
Copy link
Contributor

rodrigopedra commented Aug 22, 2024

@Tofandel as an internal/protected property, that would only happen if someone overrides it manually in an unintended way.

Yet, the same can happen with the static cache. One could just assign a new WeakMap instance to the cache and blow things up.

When I saw the tests I moved away to a single flag, due to the mixed recursive method call, as a single flag prevented that use case.

(check the DatabaseConcernsPreventsCircularRecursionTest and look for the PreventsCircularRecursionWithRecursiveMethod@callOtherStack method)

But I'd still go for a per-instance stack than a global WeakMap. That is just as a matter of preference, the current state of this PR is a great addition, IMO.


EDIT: on the first iteration of the flag-based alternative, the sample command calls the recursive method twice, without any issues.

@Tofandel
Copy link
Contributor

Tofandel commented Aug 22, 2024

I was talking about your original implementation in comments, your new implementation is almost the same as the one proposed in this PR and uses a local Map of the callstack as well which doesn't have this issue

I do prefer when things are not static as well (because static always causes unexpected issues when callbacks are involved eg #51825), so if your solution manages all the test cases of this PR and has no additional performance issue, then it's better

@rodrigopedra
Copy link
Contributor

@Tofandel it has a different approach for not caching the first ever $default value.

On this PR's implementation, the first ever $default value is cached for future reference, on mine it uses the then current value from the next recursive call.

But nonetheless, apart from the cross recursive calls between different functions, which clearly don't work with the single flag approach, I wonder how keeping a map of call stack hashes is much different than the flag approach, regarding your functional vs OOP comment.

I moved away from the single flag implementation to accommodate the cross recursive calls for different functions. Although I think that for the serialization use case it would be totally fine, as none of the currently affected methods reference each other.

It is ok if you don't want to stretch this further, and thanks again for reviewing my code =)

@taylorotwell taylorotwell merged commit 2835e2b into laravel:11.x Aug 22, 2024
29 checks passed
@taylorotwell
Copy link
Member

Thanks @samlev - really well done PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants